Abstract: We study the rate of Bayesian consistency for hierarchical priors
consisting of prior weights on a model index set and a prior on a density
model for each choice of model index. Ghosal, Lember and Van der Vaart
[2] have obtained general in-probability theorems on the rate of convergence
of the resulting posterior distributions. We extend their results to
almost sure assertions. As an application we study log spline densities with
a finite number of models and obtain that the Bayes procedure achieves
the optimal minimax rate $n^{-\gamma/(2\gamma+1)}$ of convergence if the true
density of the observations belongs to the Hölder space $C^{\gamma}[0,1]$.
This strengthens a result in [1; 2]. We also study consistency of posterior
distributions of the model index and give conditions ensuring that the
posterior distributions concentrate their masses near the index of the best model.
1. Introduction

Model selection plays a key role in the theory of density estimation. Given
a collection of models, from the Bayesian point of view it is natural to put
a prior on the model index and let the resulting posteriors determine a correct
model. A rate-adaptive posterior achieves the rate of convergence provided by
the best single model from the collection. This paper handles adaptation for
density estimation within the Bayesian framework. Suppose that we observe a
random sample $X_1, X_2, \dots, X_n$ generated from a probability distribution
$P_{f_0}$ with a density function $f_0$ with respect to some dominating
$\sigma$-finite measure $\mu$ on a measurable space $\mathbb{X}$. Let $I_n$
denote an at most countable index set for each positive integer $n$. For
$\gamma \in I_n$, $\mathcal{P}_{n,\gamma}$ stands for a subset of the density
space $\mathbb{F}$ equipped with a $\sigma$-field such that the mapping
$(x, f) \mapsto f(x)$ is measurable relative to the product $\sigma$-field on
$\mathbb{X} \times \mathcal{P}_{n,\gamma}$. Let $\Pi_{n,\gamma}$ be a
probability measure on $\mathcal{P}_{n,\gamma}$ and let
$\{\lambda_{n,\gamma} : \gamma \in I_n\}$ be a discrete probability measure on
$I_n$. One can therefore define an overall prior $\Pi_n$ with support on
$\bigcup_{\gamma \in I_n} \mathcal{P}_{n,\gamma} \subset \mathbb{F}$ by

$$\Pi_n = \sum_{\gamma \in I_n} \lambda_{n,\gamma}\, \Pi_{n,\gamma}.$$
The corresponding posterior distribution $\Pi_n(\,\cdot \mid X_1, X_2, \dots, X_n)$
is a random probability measure with the expression

$$\Pi_n(A \mid X_1, X_2, \dots, X_n) = \frac{\int_A R_n(f)\, \Pi_n(df)}{\int_{\mathbb{F}} R_n(f)\, \Pi_n(df)}$$

for all measurable subsets $A \subset \mathbb{F}$, where
$R_n(f) = \prod_{i=1}^{n} \{ f(X_i)/f_0(X_i) \}$ denotes the likelihood ratio.
The posterior distribution $\Pi_n(\,\cdot \mid X_1, \dots, X_n)$ is said
to be consistent almost surely (or in probability) at a rate at least
$\varepsilon_n$ if there exists a constant $r > 0$ such that
$\Pi_n(f : d(f, f_0) \ge r\varepsilon_n \mid X_1, X_2, \dots, X_n) \to 0$
almost surely (or in probability) as $n \to \infty$. Throughout this paper we
assume that $d$ is a distance bounded above by the Hellinger distance and that
$d(f, f_1)^s$ is a convex function of $f$ for some positive constant $s$ and any
fixed $f_1$. Almost sure convergence and convergence in probability are to be
understood with respect to the infinite product distribution $P_{f_0}^{\infty}$
of the true distribution $P_{f_0}$.

The purpose is to deal with the following problem: assume that for a given
density $f_0$ there exists a best model $\mathcal{P}_{n,\beta_n}$ equipped with
a prior $\Pi_{n,\beta_n}$ such that the optimal posterior rate is
$\varepsilon_{n,\beta_n}$. Find conditions ensuring that the posterior
distributions of the hierarchical prior $\Pi_n$ achieve the same rate of
convergence as if we only used the best single model prior $\Pi_{n,\beta_n}$ for
this $f_0$. Ghosal, Lember and Van der Vaart [1; 2] have studied adaptation to
general models and obtained in-probability results on the convergence rate. See
also Huang [4] and Lember and van der Vaart [5] for related work on Bayesian
adaptation. When applied to log spline density models, Theorem 2.1 of [2] leads
to adaptation up to a logarithmic factor, and it was shown in [2] that the
additional logarithmic factor in the convergence rate can be removed by
choosing special prior weights $\lambda_{n,\gamma}$ when the $I_n$ are finite
sets or the priors $\Pi_{n,\gamma}$ are discrete. Our main goal in the present
paper is to extend the work of Ghosal et al. [2] and establish the
corresponding almost sure assertions. Applying our theorems to log spline
densities with finitely many models, we remove the logarithmic factor without
using the special prior weights $\lambda_{n,\gamma}$, and hence for a true
density in $C^{\gamma}[0,1]$ the posteriors attain the optimal rate of
convergence in the minimax sense, which is well known to be
$n^{-\gamma/(2\gamma+1)}$. This strengthens Theorem 5.2 in [2] and Theorem
2 in [1]. A related problem is model selection, for which we establish an almost
sure result on consistency of posterior distributions of the model index.

We shall use the Hellinger distance $H(f,g) = \|\sqrt{f} - \sqrt{g}\|_2$ and its
modification
$H_*(f,g) = \big\| (\sqrt{f} - \sqrt{g}) \big( \tfrac{2}{3}\sqrt{f/g} + \tfrac{1}{3} \big)^{1/2} \big\|_2$,
where $\|f\|_p = \big( \int_{\mathbb{X}} |f(x)|^p \, \mu(dx) \big)^{1/p}$.
Observe that $H_*(f,g) \ne H_*(g,f)$ in general; see [9] for properties of
$H_*(f,g)$. Denote

$$W_{n,\gamma}(\varepsilon) = \{ f \in \mathcal{P}_{n,\gamma} : H_*(f_0, f) \le \varepsilon \},$$

$$A_{n,\gamma}(\varepsilon) = \{ f \in \mathcal{P}_{n,\gamma} : d(f, f_0) \le \varepsilon \}.$$

Throughout this paper the notation $a \lesssim b$ means that $a \le Cb$ for some
positive constant $C$ which is universal or fixed in the proof. Write
$a \approx b$ if $a \lesssim b$ and $b \lesssim a$. For a measure $P$ and an
integrable function $f$ on $\mathbb{X}$, we let $Pf$ stand for the integral of
$f$ on $\mathbb{X}$ with respect to $P$. The notation
$N(\delta, \mathcal{G}, d)$ stands for the minimal number of balls of radius
$\delta$ relative to the distance $d$ needed to cover a subset $\mathcal{G}$ of
$\mathbb{F}$.
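
As a purely numerical illustration of the two distances, the following sketch evaluates them for densities on $[0,1]$ with Lebesgue measure. It assumes the modified distance has the form $H_*(f,g)^2 = \int (\sqrt{f}-\sqrt{g})^2 (\tfrac{2}{3}\sqrt{f/g} + \tfrac{1}{3})$; the helper names `integrate`, `hellinger` and `hellinger_star` and the two example densities are hypothetical choices made for this example only.

```python
import math

def integrate(func, n=20000):
    """Midpoint rule on [0, 1] with n equal cells."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))

def hellinger(f, g):
    """H(f, g) = || sqrt(f) - sqrt(g) ||_2 on [0, 1]."""
    return math.sqrt(integrate(lambda x: (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2))

def hellinger_star(f, g):
    """Assumed form: H_*(f, g)^2 = int (sqrt f - sqrt g)^2 (2/3 sqrt(f/g) + 1/3)."""
    def integrand(x):
        a, b = math.sqrt(f(x)), math.sqrt(g(x))
        return (a - b) ** 2 * (2.0 * a / (3.0 * b) + 1.0 / 3.0)
    return math.sqrt(integrate(integrand))

# Two example densities on [0, 1], both bounded away from zero.
f = lambda x: 0.5 + x
g = lambda x: 1.0

h_fg, h_gf = hellinger(f, g), hellinger(g, f)
hs_fg, hs_gf = hellinger_star(f, g), hellinger_star(g, f)
```

In this example `h_fg` and `h_gf` coincide, while `hs_fg` and `hs_gf` differ, which is the asymmetry noted above.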



2. Adaptation and Model Selection 

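Denote by $\varepsilon_{n,\gamma}$ the usual optimal convergence rate of
posteriors obtained by using the single model $\mathcal{P}_{n,\gamma}$ with the
prior $\Pi_{n,\gamma}$. We shall use a partition $I_n = I_n^1 \cup I_n^2$ with

$$I_n^1 = \{ \gamma \in I_n : \varepsilon_{n,\gamma} \le \sqrt{H}\, \varepsilon_{n,\beta_n} \} \quad\text{and}\quad I_n^2 = \{ \gamma \in I_n : \varepsilon_{n,\gamma} > \sqrt{H}\, \varepsilon_{n,\beta_n} \},$$

where $H$ is a fixed constant $\ge 1$.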
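
The partition can be made concrete with a small sketch. It assumes, for illustration only, the rates $\varepsilon_{n,\gamma} = n^{-\gamma/(2\gamma+1)}$ used for log spline models in Section 3; the function names `rate` and `partition` are hypothetical.

```python
import math

def rate(n, gamma):
    """Assumed single-model rate n^(-gamma / (2 gamma + 1))."""
    return n ** (-gamma / (2.0 * gamma + 1.0))

def partition(n, gammas, beta, H=1.0):
    """Split the index set into I^1 (rate <= sqrt(H) * best rate) and I^2 (the rest)."""
    threshold = math.sqrt(H) * rate(n, beta)
    i1 = [g for g in gammas if rate(n, g) <= threshold]
    i2 = [g for g in gammas if rate(n, g) > threshold]
    return i1, i2

# With beta = 2 and H = 1 the rougher models fall into I^2.
i1, i2 = partition(10 ** 4, [0.6, 1.0, 2.0, 3.0], beta=2.0)
```

Increasing `H` moves indices whose rates are within a factor $\sqrt{H}$ of the best rate from the second set into the first.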

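Theorem 2.1. Suppose that there exist positive constants $H \ge 1$, $E_\gamma$,
$\mu_{n,\gamma}$, $G$, $J$, $L$, $C$ and $0 < \alpha < 1$ such that
$1 - \alpha > 18\alpha L$,
$n\varepsilon_{n,\beta_n}^2 \ge (1 + \tfrac{1}{C}) \log n$,
$\sup_{\gamma \in I_n^1} E_\gamma\, \varepsilon_{n,\gamma}^2 \le G\, \varepsilon_{n,\beta_n}^2$,
$\sup_{\gamma \in I_n^2} E_\gamma \le G$ and
$\sum_{\gamma \in I_n} \mu_{n,\gamma}^{\alpha} = O\big(e^{J n \varepsilon_{n,\beta_n}^2}\big)$.
Let $r$ be a constant with
$r > \frac{18(C + J + G + 3\alpha + 2\alpha C)}{1 - \alpha - 18\alpha L} + \sqrt{H} + 1$
such that

(1) $N\big(\tfrac{\varepsilon}{3}, A_{n,\gamma}(2\varepsilon), d\big) \le e^{E_\gamma n \varepsilon_{n,\gamma}^2}$ for all $\gamma \in I_n$ and $\varepsilon \ge \varepsilon_{n,\gamma}$,

(2) $\dfrac{\lambda_{n,\gamma}\, \Pi_{n,\gamma}(A_{n,\gamma}(j\varepsilon_{n,\gamma}))}{\lambda_{n,\beta_n}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))} \le \mu_{n,\gamma}\, e^{L j^2 n \varepsilon_{n,\gamma}^2}$ for all $\gamma \in I_n^2$ and $j \ge r$,

(3) $\dfrac{\lambda_{n,\gamma}\, \Pi_{n,\gamma}(A_{n,\gamma}(j\varepsilon_{n,\beta_n}))}{\lambda_{n,\beta_n}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))} \le \mu_{n,\gamma}\, e^{L j^2 n \varepsilon_{n,\beta_n}^2}$ for all $\gamma \in I_n^1$ and $j \ge r$,

(4) $\displaystyle \sum_{n=1}^{\infty} \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma}\, \Pi_{n,\gamma}\big(A_{n,\gamma}(\tfrac{r}{\sqrt{H}}\varepsilon_{n,\gamma})\big)\, e^{(3+2C) n \varepsilon_{n,\beta_n}^2}}{\lambda_{n,\beta_n}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))} < \infty$.

Then

$$\Pi_n\big(f : d(f, f_0) \ge r\varepsilon_{n,\beta_n} \mid X_1, X_2, \dots, X_n\big) \to 0$$

almost surely as $n \to \infty$.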

Clearly, it is enough to assume that all inequalities in Theorem 2.1 hold for 
all sufficiently large n. As a direct consequence of Theorem 2.1, we have 



Y. Xing/On adaptive Bayesian inference 






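Corollary 2.2. Suppose that there exist positive constants $H \ge 1$, $E_\gamma$,
$\mu_{n,\gamma}$, $G$, $J$, $L$, $C$, $F$ and $0 < \alpha < 1$ such that
$1 - \alpha > 18\alpha L$,
$n\varepsilon_{n,\beta_n}^2 \ge (1 + \tfrac{1}{C}) \log n$,
$\sup_{\gamma \in I_n^1} E_\gamma\, \varepsilon_{n,\gamma}^2 \le G\, \varepsilon_{n,\beta_n}^2$,
$\sup_{\gamma \in I_n^2} E_\gamma \le G$ and
$\sum_{\gamma \in I_n} \mu_{n,\gamma}^{\alpha} = O\big(e^{J n \varepsilon_{n,\beta_n}^2}\big)$.
Let $r$ be a constant such that
$r > \frac{18(C + J + G + 3\alpha + 2\alpha C)}{1 - \alpha - 18\alpha L} + \sqrt{H} + 1$
and

(1) $N\big(\tfrac{\varepsilon}{3}, A_{n,\gamma}(2\varepsilon), d\big) \le e^{E_\gamma n \varepsilon_{n,\gamma}^2}$ for all $\gamma \in I_n$ and $\varepsilon \ge \varepsilon_{n,\gamma}$,

(2) $\dfrac{\lambda_{n,\gamma}}{\lambda_{n,\beta_n}} \le \mu_{n,\gamma}\, e^{(L-F) n (\varepsilon_{n,\gamma}^2 \vee \varepsilon_{n,\beta_n}^2)}$ for all $\gamma \in I_n$,

(3) $\displaystyle \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma}}{\lambda_{n,\beta_n}}\, \Pi_{n,\gamma}\big(A_{n,\gamma}(\tfrac{r}{\sqrt{H}}\varepsilon_{n,\gamma})\big) = O\big(e^{-(3+3C+F) n \varepsilon_{n,\beta_n}^2}\big)$,

(4) $\Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n})) \ge e^{-F n \varepsilon_{n,\beta_n}^2}$.

Then

$$\Pi_n\big(f : d(f, f_0) \ge r\varepsilon_{n,\beta_n} \mid X_1, X_2, \dots, X_n\big) \to 0$$

almost surely as $n \to \infty$.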

Condition (3) of Theorem 2.1 leads to adaptation up to a logarithmic factor for
log spline density models; see [2] for the corresponding in-probability assertion.
The following theorem is useful for removing the logarithmic factor in some cases.

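Theorem 2.3. Theorem 2.1 holds for
$r > \frac{18(C + J + G + 3\alpha K + 2\alpha C K)}{1 - \alpha - 18\alpha L} + \sqrt{H} + 1$
if the condition (3) of Theorem 2.1 is replaced by the condition that there
exists a constant $K \ge 1$ independent of $n$, $\gamma$, $j$ such that

(3') $\dfrac{\Pi_{n,\gamma}(A_{n,\gamma}(j\varepsilon_{n,\beta_n}))}{\Pi_{n,\gamma}(W_{n,\gamma}(K\varepsilon_{n,\beta_n}))} \le \mu_{n,\gamma}\, e^{L j^2 n \varepsilon_{n,\beta_n}^2}$ for all $\gamma \in I_n^1$ and $j \ge r$.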

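Now we consider the rate of convergence of posterior distributions of the index
parameter $\gamma$. Given a subset $I$ of $I_n$, Ghosal et al. [2] introduced
the posteriors

$$\Pi_n(I \mid X_1, X_2, \dots, X_n) = \frac{\sum_{\gamma \in I} \lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma}} R_n(f)\, \Pi_{n,\gamma}(df)}{\sum_{\gamma \in I_n} \lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma}} R_n(f)\, \Pi_{n,\gamma}(df)}.$$

Clearly, the result of Theorem 2.1 implies that

$$\Pi_n\big(\gamma \in I_n : d(f_0, \mathcal{P}_{n,\gamma}) \ge r\varepsilon_{n,\beta_n} \mid X_1, X_2, \dots, X_n\big) \to 0$$

almost surely as $n \to \infty$. Moreover, we have

Theorem 2.4. Under the same assumptions as in Theorem 2.1, we have that

$$\Pi_n(I_n^2 \mid X_1, X_2, \dots, X_n) \to 0$$

almost surely as $n \to \infty$. If furthermore for
$I_n^3 = \{ \gamma \in I_n : \sqrt{H}\, \varepsilon_{n,\gamma} < \varepsilon_{n,\beta_n} \}$
we have that

$$\sum_{n=1}^{\infty} \sum_{\gamma \in I_n^3} \frac{\lambda_{n,\gamma}\, \Pi_{n,\gamma}(A_{n,\gamma}(r\varepsilon_{n,\beta_n}))\, e^{(3+2C) n \varepsilon_{n,\beta_n}^2}}{\lambda_{n,\beta_n}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))} < \infty,$$

then

$$\Pi_n\Big(\gamma \in I_n : \tfrac{1}{\sqrt{H}}\varepsilon_{n,\beta_n} \le \varepsilon_{n,\gamma} \le \sqrt{H}\, \varepsilon_{n,\beta_n} \,\Big|\, X_1, X_2, \dots, X_n\Big) \to 1$$

almost surely as $n \to \infty$.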

Since $H$ is an arbitrarily given constant bigger than 1, Theorem 2.4 states
that the posterior distributions of the model index concentrate their masses on
the indices of those models which have approximately the same convergence
rate as the correct rate $\varepsilon_{n,\beta_n}$. So Theorem 2.4 can be
considered as a general convergence theorem on posterior distributions of the
model index.

In the situation where there are only two models, one can use the Bayes factor
to describe the behavior of the posterior of the model index, see [2]. Denote by
$BF_n$ the Bayes factor, that is,

$$BF_n := \frac{\lambda_{n,2} \int_{\mathcal{P}_{n,2}} R_n(f)\, \Pi_{n,2}(df)}{\lambda_{n,1} \int_{\mathcal{P}_{n,1}} R_n(f)\, \Pi_{n,1}(df)} = \frac{\Pi_n(\{2\} \mid X_1, X_2, \dots, X_n)}{\Pi_n(\{1\} \mid X_1, X_2, \dots, X_n)}.$$
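
To make the Bayes factor concrete, here is a toy sketch in which each model is a finite list of densities on $(0,1)$ with prior weights, so the integrals become finite sums. The true density $f_0$ cancels in the ratio, so it is omitted. The models, the data and the names `log_marginal` and `bayes_factor` are illustrative assumptions, not objects from the paper.

```python
import math

def log_marginal(data, model):
    """log of sum_f prior(f) * prod_i f(x_i) for a finite model [(weight, density), ...]."""
    logs = [math.log(w) + sum(math.log(f(x)) for x in data) for w, f in model]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs))

def bayes_factor(data, model1, model2, lam1=0.5, lam2=0.5):
    """Posterior odds of model 2 against model 1 with model weights lam2, lam1."""
    log_bf = (math.log(lam2) + log_marginal(data, model2)
              - math.log(lam1) - log_marginal(data, model1))
    return math.exp(log_bf)

# Model 1: the uniform density only. Model 2: two linear densities, equally weighted.
model1 = [(1.0, lambda x: 1.0)]
model2 = [(0.5, lambda x: 2.0 * x), (0.5, lambda x: 2.0 * (1.0 - x))]
data = [0.9, 0.8, 0.95]  # observations strictly inside (0, 1)

bf = bayes_factor(data, model1, model2)
```

For these three observations near 1 the marginal likelihood of model 2 is $0.5 \cdot 5.472 + 0.5 \cdot 0.008 = 2.74$, so the Bayes factor favours model 2.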



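Corollary 2.5. Suppose that condition (1) of Theorem 2.1 holds and that
$\varepsilon_{n,1} > \varepsilon_{n,2} \ge \sqrt{(1 + 1/C)(\log n)/n}$ for all
$n$ and some $C > 0$. Let $r > 700(2C + G + 2)$.

(i) If $\Pi_{n,2}(W_{n,2}(\varepsilon_{n,2})) \ge e^{-n\varepsilon_{n,2}^2}$,
$\frac{\lambda_{n,1}}{\lambda_{n,2}} \Pi_{n,1}(A_{n,1}(r\varepsilon_{n,1})) = O\big(e^{-(4+3C) n \varepsilon_{n,2}^2}\big)$
and $\frac{\lambda_{n,1}}{\lambda_{n,2}} \le e^{n\varepsilon_{n,1}^2}$, then
$BF_n \to \infty$ almost surely.

(ii) If $\Pi_{n,1}(W_{n,1}(\varepsilon_{n,1})) \ge e^{-n\varepsilon_{n,1}^2}$,
$\frac{\lambda_{n,2}}{\lambda_{n,1}} \Pi_{n,2}(A_{n,2}(r\varepsilon_{n,1})) = O\big(e^{-(4+3C) n \varepsilon_{n,1}^2}\big)$
and $\frac{\lambda_{n,2}}{\lambda_{n,1}} \le e^{n\varepsilon_{n,1}^2}$, then
$BF_n \to 0$ almost surely.

Proof. Take $H = J = F = 1$, $L = 2$ and $\alpha = 1/38$. Then
$1 - \alpha > 18\alpha L$.

(i) Let $\beta_n = 2$. Then $I_n^1 = \{2\}$ and $I_n^2 = \{1\}$. It follows then
from the first assertion of Theorem 2.4 that the denominator of the Bayes
factor $BF_n$ tends to zero almost surely as $n \to \infty$ and hence
$BF_n \to \infty$ almost surely.

(ii) Let $\beta_n = 1$. Then $I_n^1 = \{1, 2\}$, $I_n^2 = \emptyset$ and
$I_n^3 = \{2\}$. It follows then from the second assertion of Theorem 2.4 that
the numerator of the Bayes factor $BF_n$ tends to zero almost surely as
$n \to \infty$ and hence $BF_n \to 0$ almost surely. The proof of Corollary 2.5
is complete. □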
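
The constants chosen in this proof can be checked by exact arithmetic. The sketch below assumes that the threshold on $r$ in Theorem 2.1 reads $r > 18(C+J+G+3\alpha+2\alpha C)/(1-\alpha-18\alpha L) + \sqrt{H} + 1$; the function name `r_threshold` is hypothetical. With $H = J = 1$, $L = 2$ and $\alpha = 1/38$ the threshold simplifies to $720C + 684G + 740$, which the bound $700(2C+G+2)$ of the corollary dominates.

```python
from fractions import Fraction

def r_threshold(C, G, J=1, L=2, alpha=Fraction(1, 38), sqrt_H=1):
    """Assumed lower bound on r from Theorem 2.1, evaluated exactly."""
    alpha = Fraction(alpha)
    denom = 1 - alpha - 18 * alpha * L
    assert denom > 0, "requires 1 - alpha > 18 * alpha * L"
    return 18 * (C + J + G + 3 * alpha + 2 * alpha * C) / denom + sqrt_H + 1

# With alpha = 1/38 and L = 2: 1 - alpha - 18*alpha*L = 1/38 > 0.
example = r_threshold(C=1, G=1)      # equals 720 + 684 + 740
corollary_bound = 700 * (2 * 1 + 1 + 2)
```

Here `example` is 2144 and `corollary_bound` is 3500, so any $r > 700(2C+G+2)$ satisfies the assumed requirement for these values of $C$ and $G$.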



3. Log Spline Density Models 



Log spline density models were introduced by Stone [7] in his study of sieved
maximum likelihood estimators, and were developed by Ghosal, Ghosh and
Van der Vaart [3] for Bayesian estimators. Assume that
$[(k-1)/K_n, k/K_n)$ with $k = 1, 2, \dots, K_n$ is a given partition of the
half open interval $[0, 1)$. The space of splines of order $q$ relative to this
partition is the set of all functions $f : [0,1] \to \mathbb{R}$ such that $f$
is $q - 2$ times continuously differentiable on $[0, 1)$ and the restriction of
$f$ to each $[(k-1)/K_n, k/K_n)$ is a polynomial of degree strictly less than
$q$. Given $\gamma > 0$, denote $J_{n,\gamma} = q + K_n - 1$ where $q$ is a
fixed constant $\ge \gamma$. The space of splines is a
$J_{n,\gamma}$-dimensional vector space with a basis
$B_1(x), B_2(x), \dots, B_{J_{n,\gamma}}(x)$ of B-splines, each of which is a
uniformly bounded nonnegative function supported on some interval of length
$q/K_n$; see [3] for the details of such a B-spline basis. Assume throughout
that the true density $f_0(x)$ is bounded away from zero and infinity on
$[0,1]$. We consider the $J_{n,\gamma}$-dimensional exponential subfamily of
$C^{\gamma}[0,1]$ of the form

$$f_\theta(x) = \exp\Big( \sum_{j=1}^{J_{n,\gamma}} \theta_j B_j(x) - c(\theta) \Big),$$

where
$\theta = (\theta_1, \theta_2, \dots, \theta_{J_{n,\gamma}}) \in \Theta_0 = \{ (\theta_1, \theta_2, \dots, \theta_{J_{n,\gamma}}) \in \mathbb{R}^{J_{n,\gamma}} : \sum_{j=1}^{J_{n,\gamma}} \theta_j = 0 \}$
and the constant $c(\theta)$ is chosen such that $f_\theta(x)$ is a density on
$[0,1]$. Each prior $\bar\Pi_{n,\gamma}$ on $\Theta_0$ induces naturally a prior
$\Pi_{n,\gamma}$ on the density set
$\mathcal{P}_{n,\gamma} := \{ f_\theta(x) : \theta \in \Theta_0 \}$. Assume
that $J_{n,\gamma} \approx K_n \approx n^{1/(2\gamma+1)}$ and assume that the
prior $\bar\Pi_{n,\gamma}$ for $\Theta_0$ is supported on
$[-M, M]^{J_{n,\gamma}}$ for some $M \ge 1$ and has a density function with
respect to the Lebesgue measure, which is bounded on $[-M, M]^{J_{n,\gamma}}$
below by $d^{J_{n,\gamma}}$ and above by $D^{J_{n,\gamma}}$ for two fixed
constants $d$ and $D$ with $0 < d \le D < \infty$. Write
$\|\theta\|_p = \big( \sum_{j=1}^{J_{n,\gamma}} |\theta_j|^p \big)^{1/p}$ and
$\|f_\theta\|_p = \big( \int f_\theta(x)^p\, dx \big)^{1/p}$ for
$1 \le p \le \infty$. Take constants $\bar{C}_1 > \underline{C}_1 > 0$ such that
$\underline{C}_1 \|\theta\|_\infty \le \|\log f_\theta\|_\infty \le \bar{C}_1 \|\theta\|_\infty$
for all $\theta \in \Theta_0$; see Lemma 7.3 in [2] for the existence of
$\underline{C}_1$ and $\bar{C}_1$. Hence
$e^{-\bar{C}_1 M} \le f_\theta(x) \le e^{\bar{C}_1 M}$ for all
$\theta \in \Theta_0$ with $\|\theta\|_\infty \le M$. Ghosal et al. ([3],
Theorem 4.5) proved that, if $f_0 \in C^{\gamma}[0,1]$ with
$q \ge \gamma \ge 1/2$ and $\|\log f_0\|_\infty \le \underline{C}_1 M/2$, the
posteriors are consistent in probability at the rate
$n^{-\gamma/(2\gamma+1)}$. This result has been strengthened by Xing [9] to the
almost sure consistency of the posteriors.
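
The structure of the exponential family can be sketched in the simplest case $q = 1$, where the B-splines reduce to indicator functions of the $K_n$ cells and $J_{n,\gamma} = K_n$. This special case and the function name `log_spline_density` are assumptions made for illustration; a general order $q$ would need a proper B-spline basis.

```python
import math

def log_spline_density(theta):
    """Order q = 1 (piecewise constant) special case on K = len(theta) equal cells.

    Returns f_theta(x) = exp(theta_k - c(theta)) for x in the k-th cell, where
    c(theta) = log((1/K) * sum_k exp(theta_k)) makes f_theta integrate to one.
    """
    K = len(theta)
    c = math.log(sum(math.exp(t) for t in theta) / K)
    def f(x):
        k = min(int(x * K), K - 1)  # index of the cell containing x
        return math.exp(theta[k] - c)
    return f

theta = [0.3, -0.1, -0.2]           # coordinates sum to zero, as in Theta_0
f_theta = log_spline_density(theta)
# The density is constant on each cell, so its integral is the average cell value.
total_mass = sum(f_theta((k + 0.5) / 3) for k in range(3)) / 3
```

In this piecewise constant case $|c(\theta)| \le \|\theta\|_\infty$, so $|\log f_\theta| \le 2\|\theta\|_\infty$, a concrete instance of the two-sided bound on $f_\theta$ stated above.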

For given priors $\Pi_{n,\gamma}$ on densities and a discrete prior
$\{\lambda_{n,\gamma}\}$ on regularity parameters $\gamma$, we get an overall
prior $\Pi_n$ on densities as before. Under mild conditions, Ghosal et al. [2]
obtained an in-probability theorem on adaptation up to a logarithmic factor for
the posteriors. They also showed in [1; 2] that the logarithmic factor can be
removed by choosing special prior weights $\lambda_{n,\gamma}$ either when the
$I_n$ are finite sets or when all the priors $\Pi_{n,\gamma}$ are discrete.
Now, for finite index sets $I_n$, we can remove the logarithmic factor without
using the special prior weights $\lambda_{n,\gamma}$, and our result moreover
is an almost sure statement.

Following [2], we consider prior weights
$\lambda_{n,\gamma} = \lambda_\gamma > 0$ for all $n$ and
$\gamma \in I_n := \{ \gamma \in \mathbb{Q}^+ : \gamma \ge \gamma_1 \}$, where
$\gamma_1$ is a known positive constant strictly bigger than 1/2. Now we prove

Theorem 3.1. Let $I_n = \{\gamma_1, \gamma_2, \dots, \gamma_N\}$ and
$\varepsilon_{n,\gamma} = n^{-\gamma/(2\gamma+1)}$ for all $\gamma \in I_n$.
If $f_0 \in C^{\beta}[0,1]$ with some $\beta \in I_n$ and
$\|\log f_0\|_\infty < \underline{C}_1 M$, then for all large constants $r$,

$$\Pi_n\{ f_\theta : \|f_\theta - f_0\|_2 \ge r\varepsilon_{n,\beta} \mid X_1, \dots, X_n \} \to 0$$

almost surely as $n \to \infty$.

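Proof. We shall apply Theorem 2.3 for the Hellinger distance.
Observe first that
$n\varepsilon_{n,\beta}^2 = n^{1/(2\beta+1)} \ge (1 + 1/C) \log n$ when $n$ is
large enough and $C = 1$. Take
$\mu_{n,\gamma} = \lambda_\gamma/\lambda_\beta$. Condition (1) of Theorem 2.3
has been verified in [2]. Denote

$$\Theta_{0,M} = \{ \theta \in \Theta_0 : \|\theta\|_\infty \le M \},$$

$$C_{J_{n,\gamma}}(\varepsilon) = \{ f_\theta : H(f_\theta, f_0) \le \varepsilon \text{ and } \theta \in \Theta_{0,M} \},$$

$$W_{J_{n,\gamma}}(\varepsilon) = \{ f_\theta : H_*(f_0, f_\theta) \le \varepsilon \text{ and } \theta \in \Theta_{0,M} \}.$$

Since $f_0/f_\theta$ are uniformly bounded above by
$e^{(\bar{C}_1 + \underline{C}_1) M}$ and below by
$e^{-(\bar{C}_1 + \underline{C}_1) M}$ for all $\theta \in \Theta_{0,M}$, we have

$$W_{J_{n,\gamma}}(\varepsilon/B) \subset C_{J_{n,\gamma}}(\varepsilon) \subset W_{J_{n,\gamma}}(B\varepsilon)$$

for $B = e^{(\bar{C}_1 + \underline{C}_1) M/2}$. Hence, applying Lemma 7.6 and
Lemma 7.8 in [2], one can find four positive constants $\underline{A}_1$,
$\underline{A}_2$, $\bar{A}_1$ and $\bar{A}_2$ such that for all large $n$ and
all $\varepsilon > 0$,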



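$$\Pi_{n,\gamma}(C_{J_{n,\gamma}}(\varepsilon)) \le \bar\Pi_{n,\gamma}\big( \theta \in \Theta_{0,M} : \|\theta - \theta_{J_{n,\gamma}}\|_2 \le \bar{A}_1 \sqrt{J_{n,\gamma}}\, \varepsilon \big)$$

and

$$\Pi_{n,\beta}(W_{J_{n,\beta}}(\varepsilon_{n,\beta})) \ge \bar\Pi_{n,\beta}\big( \theta \in \Theta_{0,M} : \|\theta - \theta_{J_{n,\beta}}\|_2 \le \underline{A}_1 \sqrt{J_{n,\beta}}\, \varepsilon_{n,\beta} \big),$$

where $\bar\Pi_{n,\gamma}$ is the corresponding prior of $\Pi_{n,\gamma}$ and
$\theta_{J_{n,\gamma}}$ minimizes the map $\theta \mapsto H(f_\theta, f_0)$ over
$\Theta_{0,M}$. In fact, Lemma 7.6 of [2] yields the first inequality for
$0 < \varepsilon < 1/\bar{A}_1$. However, since $\|\theta\|_\infty \le M$ for
$\theta \in \Theta_{0,M}$ and $J_{n,\gamma} \to \infty$ as $n \to \infty$, the
inequality holds even for all $\varepsilon \ge 1/\bar{A}_1$ and large $n$. It
then follows from
$\sqrt{J_{n,\gamma}}\, \varepsilon_{n,\gamma} \approx n^{1/(2(2\gamma+1))}\, n^{-\gamma/(2\gamma+1)} = n^{(1-2\gamma)/(2(2\gamma+1))}$
for all $\gamma \in I_n$ and hence for $\gamma = \beta$ that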



IL, 7 (C,^(je„, 7 )) (j^™) 



^ (l-2 7 )J» ;7 (2/3-l)J n ,ff ^ _ i a T . -r , 7 . . 

ea:P V^ 2(27+1) 2(2/3 + 1) ) S " n ' 7 g - 2 ~ g 2 ™' 7 gJ 

Now, for $\gamma \in I_n^2$ we have that
$\varepsilon_{n,\gamma} > \sqrt{H}\, \varepsilon_{n,\beta} \ge \varepsilon_{n,\beta}$,
which implies $\gamma < \beta$ and
$\varepsilon_{n,\gamma}^2 > H \varepsilon_{n,\beta}^2$. Therefore, using
$\frac{1-2\gamma}{2(2\gamma+1)} \le \frac{1-2\gamma_1}{2(2\gamma_1+1)} < 0$ for
$\gamma \ge \gamma_1 > 1/2$, we get that for large $n$ the exponent in the right
hand side of the last equality does not exceed a constant multiple of the
following sum

/ 1 - 27 2/3 - 1 1 \ 

J ^ log Hl(2>Ti) + WTT)h) + J ^ logJ 
1 — 2 2 

< J ".7 lo g n 5 ^2 7l + 1) + J ".7 lo g.? ^ ^n, 7 logj < Lj ne nrf , 

where $L$ is any given positive constant, the second inequality holds for a large
constant $H$ depending only on $\gamma_1$, and the last inequality follows from
$J_{n,\gamma} \approx n\varepsilon_{n,\gamma}^2$ and $j \ge r$ with a large $r$.
Hence we have verified condition (2). Similarly, since
$n\varepsilon_{n,\gamma}^2/H > n\varepsilon_{n,\beta}^2$ for
$\gamma \in I_n^2$, we have that for some $M_1 > 0$ and large $H$, $M_2 > 0$,
J2 J- A "-7 n "-7(g./,.,(^n.7))e (3+2) " E "^ 



n=M 2 7 e/2 A «./3 u n,0(Wj nt0 (e n , )) 

00 , , . 

n=M 2 7 e/2 P 



< V ^e" e "^ lo8 " Ml 8 <^i+V * T 

n=M 2 7 ei3 



00 . 

n=M 2 7 G/J ^ 
00 . 00 

n=M 2 yel„ P n=M 2 ' 

which yields condition (4) for (7 = 1. Finally, observe that e„ i7 11, p n 

for all 7 € i^. Since 7* contains at most finitely many indices and e n>1 is the 
convergence rate of the model Vn,-y for /o, there exists a constant K\ > 1 such 
that n ni7 (Cj (-Ki£ ni/ g)) > for all 7 G and all large n. It then follows 
from Wj n (e/B) C Cj n (e) C VFj^ (-Be) and Lemma 7.6 in [2] that for a large 
K > K u A 3 and A 3 , 

n n , 7 (Cj„ |7 (j£ n ,/3)) < (jj 3 J y / J«^ £«,/?) J "' 7 < e J„. 7 log^ < ^fne 2 ^ 

for all large $j$ and any given $L > 0$, which yields condition (3'), and
therefore by Theorem 2.3 we obtain the required convergence with respect to the
Hellinger metric, which in our case is stronger than the convergence with
respect to the metric $\|\cdot\|_2$, since the densities $f_\theta$ are
uniformly bounded for all $\theta \in \Theta_{0,M}$. The proof of Theorem 3.1
is complete. □
For general countable index sets $I_n$, Theorem 2.1 yields adaptation up to a
logarithmic factor.

Theorem 3.2. Assume that $\sum_{\gamma \in I_n} \lambda_\gamma^{\alpha} < \infty$
for some $0 < \alpha < 1$. Let
$\varepsilon_{n,\gamma} = n^{-\gamma/(2\gamma+1)} \sqrt{\log n}$ for all
$\gamma \in I_n$. If $f_0 \in C^{\beta}[0,1]$ with some $\beta \in I_n$ and
$\|\log f_0\|_\infty < \underline{C}_1 M$, then for all large constants $r$,

$$\Pi_n\{ f_\theta : \|f_\theta - f_0\|_2 \ge r\varepsilon_{n,\beta} \mid X_1, \dots, X_n \} \to 0$$

almost surely as $n \to \infty$.

Proof. Completely repeating the proof of Theorem 3.1, we obtain the condi- 
tions (1), (2) and (4) of Theorem 2.1. To see condition (3). note that e n 7 < 
£n,0 for 7 G J n and hence J„ j7 logn < ne n j < Hne n j3 . Since yj J nn £ n $ f=a 

1 3_ 

n 2(2-,+i) 2/3+1 f or a n 7) the proof of Theorem 3.1 yields that for some sufficiently 
large H and all large n, 

n„,4^( £n ^)) ~ (^yj^,^ " v ; v ; 

for all large $j$ and any given $L > 0$, which yields condition (3), and hence by
Theorem 2.1 we conclude the proof of Theorem 3.2. □



4. Appendix 

Let $L_\mu$ be the space of all nonnegative integrable functions with the norm
$\|f\|_1$. Write $\log 0 = -\infty$ and $0/0 = 0$. We shall adopt the Hausdorff
$\alpha$-entropy introduced by Xing and Ranneby in [10].

Definition 4.1. Let $\alpha \ge 0$ and $\mathcal{G} \subset \mathbb{F}$. For
$\delta > 0$, the Hausdorff $\alpha$-entropy
$J(\delta, \mathcal{G}, \alpha, \Pi, d)$ of the set $\mathcal{G}$ relative to
the prior distribution $\Pi$ and the distance $d$ is defined as

$$J(\delta, \mathcal{G}, \alpha, \Pi, d) = \log \inf \sum_{j=1}^{N} \Pi(B_j)^{\alpha},$$

where the infimum is taken over all coverings $\{B_1, B_2, \dots, B_N\}$ of
$\mathcal{G}$, where $N$ may take the value $\infty$, such that each $B_j$ is
contained in some ball $\{ f : d(f, f_j) < \delta \}$ of radius $\delta$ and
center at $f_j \in L_\mu$.

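Note that it was proved in [10] that for any $0 \le \alpha \le 1$ and
$\mathcal{G} \subset \mathbb{F}$,

$$e^{J(\delta, \mathcal{G}, \alpha, \Pi, d)} \le \Pi(\mathcal{G})^{\alpha}\, N(\delta, \mathcal{G}, d)^{1-\alpha} \le N(\delta, \mathcal{G}, d).$$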
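
A quick numerical sanity check of the first inequality, for one fixed finite covering: if $\mathcal{G}$ is covered by $N$ disjoint pieces with prior masses $p_1, \dots, p_N$, then concavity of $x \mapsto x^{\alpha}$ gives $\sum_j p_j^{\alpha} \le (\sum_j p_j)^{\alpha} N^{1-\alpha}$. The sketch below only evaluates both sides for given masses; it is not the argument of [10], and the function name `entropy_bound` is hypothetical.

```python
def entropy_bound(masses, alpha):
    """Return (lhs, rhs) with lhs = sum p_j^alpha and rhs = (sum p_j)^alpha * N^(1 - alpha)."""
    lhs = sum(p ** alpha for p in masses)
    rhs = sum(masses) ** alpha * len(masses) ** (1.0 - alpha)
    return lhs, rhs

# Three disjoint covering pieces with prior masses 0.1, 0.2, 0.3 and alpha = 1/2.
lhs, rhs = entropy_bound([0.1, 0.2, 0.3], alpha=0.5)
```

Here `lhs` is about 1.311 and `rhs` is about 1.342, and both are below $N = 3$, in line with the chain of inequalities.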

We begin with a lemma which is essentially given in the proof of Theorem 1 
of [9]. 
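Lemma 4.2. Let $0 < \alpha \le 1$, $\mathcal{G} \subset \mathbb{F}$ and
$D_{r\varepsilon} = \{ f \in \mathcal{G} : d(f, f_0) \ge r\varepsilon \}$ with
$r > 2$ and $\varepsilon > 0$. Then we have

$$E\Big( \int_{D_{r\varepsilon}} R_n(f)\, \Pi(df) \Big)^{\alpha} \le e^{J(\varepsilon, D_{r\varepsilon}, \alpha, \Pi, d) + \frac{\alpha-1}{2}(r-2)^2 n\varepsilon^2}.$$

Proof of Lemma 4.2. Since
$E \int_{D_{r\varepsilon}} R_n(f)\, \Pi(df) = \Pi(D_{r\varepsilon}) \le 1$, it
suffices to prove the lemma for $0 < \alpha < 1$. Given a constant $\phi > 1$,
by the definition of $J(\varepsilon, D_{r\varepsilon}, \alpha, \Pi, d)$ there
exist functions $f_1, f_2, \dots, f_N$ in $L_\mu$ such that
$D_{r\varepsilon} \subset \bigcup_{j=1}^{N} B_j$, where
$B_j = D_{r\varepsilon} \cap \{ f : d(f_j, f) < \varepsilon \}$ and
$\sum_{j=1}^{N} \Pi(B_j)^{\alpha} \le \phi\, e^{J(\varepsilon, D_{r\varepsilon}, \alpha, \Pi, d)}$.
By shrinking $B_j$ if necessary, we may assume that all the sets $B_j$ are
disjoint and nonempty. Taking some $g_j \in B_j$ we get that
$d(f_j, f_0) \ge d(g_j, f_0) - d(g_j, f_j) \ge (r-1)\varepsilon$. Write

$$\int_{B_j} R_n(f)\, \Pi(df) = \Pi(B_j) \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})}{f_0(X_{k+1})},$$

where
$f_{kB_j}(x) = \int_{B_j} f(x) R_k(f)\, \Pi(df) \big/ \int_{B_j} R_k(f)\, \Pi(df)$
and $R_0(f) = 1$. The function $f_{kB_j}$ was introduced by Walker [8] and can
be considered as the predictive density of $f$ with a normalized posterior
distribution, restricted to the set $B_j$. So we have

$$E\Big( \int_{D_{r\varepsilon}} R_n(f)\, \Pi(df) \Big)^{\alpha} \le \sum_{j=1}^{N} \Pi(B_j)^{\alpha}\, E\Big( \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})}{f_0(X_{k+1})} \Big)^{\alpha} \le \phi\, e^{J(\varepsilon, D_{r\varepsilon}, \alpha, \Pi, d)} \max_{1 \le j \le N} E \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^{\alpha}}{f_0(X_{k+1})^{\alpha}}.$$

Since $d(f, f_j)^s$ is a convex function of $f$ and $d(f, f_j) < \varepsilon$ on
$B_j$, Jensen's inequality implies that $d(f_{kB_j}, f_j) \le \varepsilon$ for
all $k$ and hence
$d(f_{kB_j}, f_0) \ge d(f_j, f_0) - d(f_j, f_{kB_j}) \ge (r-2)\varepsilon$.
Using $d(f_{kB_j}, f_0) \le H(f_{kB_j}, f_0)$ and following the same lines as
the proof of Theorem 1 in [9], one can get that

$$E\Big( \int_{D_{r\varepsilon}} R_n(f)\, \Pi(df) \Big)^{\alpha} \le \phi\, e^{J(\varepsilon, D_{r\varepsilon}, \alpha, \Pi, d) + \frac{\alpha-1}{2}(r-2)^2 n\varepsilon^2},$$

which by the arbitrariness of $\phi > 1$ concludes the proof of Lemma 4.2. □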
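Proof of Theorem 2.1. Denote
$D(\varepsilon) = \{ f : d(f, f_0) \ge \varepsilon \}$. Write

$$\int_{D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_n(df) = \sum_{\gamma \in I_n} \lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_{n,\gamma}(df)$$

$$= \sum_{\gamma \in I_n^1} \lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_{n,\gamma}(df) + \sum_{\gamma \in I_n^2} \lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap D(\frac{r}{\sqrt{H}}\varepsilon_{n,\gamma})} R_n(f)\, \Pi_{n,\gamma}(df)$$

$$+ \sum_{\gamma \in I_n^2} \lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap \{ f :\, r\varepsilon_{n,\beta_n} \le d(f, f_0) < \frac{r}{\sqrt{H}}\varepsilon_{n,\gamma} \}} R_n(f)\, \Pi_{n,\gamma}(df).$$

Since $0 < \alpha < 1$, it follows from the inequalities $x \le x^{\alpha}$ for
$0 \le x \le 1$ and $(x + y)^{\alpha} \le x^{\alpha} + y^{\alpha}$ for
$x, y \ge 0$ that

$$\Pi_n\big(D(r\varepsilon_{n,\beta_n}) \mid X_1, X_2, \dots, X_n\big) = \frac{\int_{D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_n(df)}{\int_{\mathbb{F}} R_n(f)\, \Pi_n(df)}$$

$$\le \sum_{\gamma \in I_n^1} \Big( \frac{\lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_{n,\gamma}(df)}{\int_{\mathbb{F}} R_n(f)\, \Pi_n(df)} \Big)^{\alpha} + \sum_{\gamma \in I_n^2} \Big( \frac{\lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap D(\frac{r}{\sqrt{H}}\varepsilon_{n,\gamma})} R_n(f)\, \Pi_{n,\gamma}(df)}{\int_{\mathbb{F}} R_n(f)\, \Pi_n(df)} \Big)^{\alpha}$$

$$+ \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap \{ f :\, r\varepsilon_{n,\beta_n} \le d(f, f_0) < \frac{r}{\sqrt{H}}\varepsilon_{n,\gamma} \}} R_n(f)\, \Pi_{n,\gamma}(df)}{\int_{\mathbb{F}} R_n(f)\, \Pi_n(df)}$$

$$\le \sum_{\gamma \in I_n^1} \frac{\lambda_{n,\gamma}^{\alpha} \big( \int_{\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_{n,\gamma}(df) \big)^{\alpha}}{\lambda_{n,\beta_n}^{\alpha} \big( \int_{\mathcal{P}_{n,\beta_n}} R_n(f)\, \Pi_{n,\beta_n}(df) \big)^{\alpha}} + \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma}^{\alpha} \big( \int_{\mathcal{P}_{n,\gamma} \cap D(\frac{r}{\sqrt{H}}\varepsilon_{n,\gamma})} R_n(f)\, \Pi_{n,\gamma}(df) \big)^{\alpha}}{\lambda_{n,\beta_n}^{\alpha} \big( \int_{\mathcal{P}_{n,\beta_n}} R_n(f)\, \Pi_{n,\beta_n}(df) \big)^{\alpha}}$$

$$+ \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap \{ f :\, r\varepsilon_{n,\beta_n} \le d(f, f_0) < \frac{r}{\sqrt{H}}\varepsilon_{n,\gamma} \}} R_n(f)\, \Pi_{n,\gamma}(df)}{\lambda_{n,\beta_n} \int_{\mathcal{P}_{n,\beta_n}} R_n(f)\, \Pi_{n,\beta_n}(df)}.$$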

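From $n\varepsilon_{n,\beta_n}^2 \ge (1 + \tfrac{1}{C}) \log n$ it turns out that
$\sum_{n=1}^{\infty} e^{-C n \varepsilon_{n,\beta_n}^2} \le \sum_{n=1}^{\infty} 1/n^{1+C} < \infty$.
Hence, by Lemma 1 of [9] and the first Borel-Cantelli Lemma, we have that

$$\int_{\mathcal{P}_{n,\beta_n}} R_n(f)\, \Pi_{n,\beta_n}(df) \ge \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))\, e^{-(3+2C) n \varepsilon_{n,\beta_n}^2}$$

almost surely for all large $n$. Thus, we obtain that

$$\Pi_n\big(D(r\varepsilon_{n,\beta_n}) \mid X_1, X_2, \dots, X_n\big) \le \sum_{\gamma \in I_n^1} \frac{\lambda_{n,\gamma}^{\alpha}\, e^{(3\alpha+2\alpha C) n \varepsilon_{n,\beta_n}^2} \big( \int_{\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_{n,\gamma}(df) \big)^{\alpha}}{\lambda_{n,\beta_n}^{\alpha}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))^{\alpha}}$$

$$+ \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma}^{\alpha}\, e^{(3\alpha+2\alpha C) n \varepsilon_{n,\beta_n}^2} \big( \int_{\mathcal{P}_{n,\gamma} \cap D(\frac{r}{\sqrt{H}}\varepsilon_{n,\gamma})} R_n(f)\, \Pi_{n,\gamma}(df) \big)^{\alpha}}{\lambda_{n,\beta_n}^{\alpha}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))^{\alpha}}$$

$$+ \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma}\, e^{(3+2C) n \varepsilon_{n,\beta_n}^2} \int_{\mathcal{P}_{n,\gamma} \cap \{ f :\, r\varepsilon_{n,\beta_n} \le d(f, f_0) < \frac{r}{\sqrt{H}}\varepsilon_{n,\gamma} \}} R_n(f)\, \Pi_{n,\gamma}(df)}{\lambda_{n,\beta_n}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))}$$

$$:= a_n + b_n + c_n$$

almost surely for all large $n$. Given $\delta > 0$, we have

$$P_{f_0}^{\infty}\{ \Pi_n(D(r\varepsilon_{n,\beta_n}) \mid X_1, X_2, \dots, X_n) \ge \delta \} \le P_{f_0}^{\infty}\{ a_n + b_n + c_n \ge \delta \}$$

$$\le P_{f_0}^{\infty}\{ a_n \ge \delta/3 \} + P_{f_0}^{\infty}\{ b_n \ge \delta/3 \} + P_{f_0}^{\infty}\{ c_n \ge \delta/3 \} \le \frac{3}{\delta} E a_n + \frac{3}{\delta} E b_n + \frac{3}{\delta} E c_n.$$

It turns out from Fubini's theorem and condition (4) that

$$\sum_{n=1}^{\infty} E c_n \le \sum_{n=1}^{\infty} \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma}\, e^{(3+2C) n \varepsilon_{n,\beta_n}^2}\, \Pi_{n,\gamma}\big( \mathcal{P}_{n,\gamma} \cap \{ f :\, r\varepsilon_{n,\beta_n} \le d(f, f_0) < \frac{r}{\sqrt{H}}\varepsilon_{n,\gamma} \} \big)}{\lambda_{n,\beta_n}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))}$$

$$\le \sum_{n=1}^{\infty} \sum_{\gamma \in I_n^2} \frac{\lambda_{n,\gamma}\, e^{(3+2C) n \varepsilon_{n,\beta_n}^2}\, \Pi_{n,\gamma}\big( A_{n,\gamma}(\frac{r}{\sqrt{H}}\varepsilon_{n,\gamma}) \big)}{\lambda_{n,\beta_n}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))} < \infty.$$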

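On the other hand, let $[r]$ be the largest integer less than or equal to $r$
and let
$D_{n,\gamma,j} = \{ f \in \mathcal{P}_{n,\gamma} : j\varepsilon_{n,\beta_n} \le d(f, f_0) < 2j\varepsilon_{n,\beta_n} \}$.
Then for any $\gamma \in I_n^1$ we have

$$\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n}) \subset \mathcal{P}_{n,\gamma} \cap D([r]\varepsilon_{n,\beta_n}) = \bigcup_{j=[r]}^{\infty} D_{n,\gamma,j}$$

and hence

$$E a_n \le \sum_{\gamma \in I_n^1} \sum_{j=[r]}^{\infty} \frac{\lambda_{n,\gamma}^{\alpha}\, e^{(3\alpha+2\alpha C) n \varepsilon_{n,\beta_n}^2}\, E\big( \int_{D_{n,\gamma,j}} R_n(f)\, \Pi_{n,\gamma}(df) \big)^{\alpha}}{\lambda_{n,\beta_n}^{\alpha}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))^{\alpha}}.$$

Since $r > \sqrt{H} + 1$, we have that
$j\varepsilon_{n,\beta_n} \ge [r]\varepsilon_{n,\gamma}/\sqrt{H} \ge \varepsilon_{n,\gamma}$
for all $\gamma \in I_n^1$ and $j \ge [r]$. It then follows from Lemma 4.2,
Lemma 1 of [10] and condition (1) that

$$E\Big( \int_{D_{n,\gamma,j}} R_n(f)\, \Pi_{n,\gamma}(df) \Big)^{\alpha} \le e^{J(\frac{j\varepsilon_{n,\beta_n}}{3}, A_{n,\gamma}(2j\varepsilon_{n,\beta_n}), \alpha, \Pi_{n,\gamma}, d) + \frac{\alpha-1}{18} j^2 n \varepsilon_{n,\beta_n}^2}$$

$$\le \Pi_{n,\gamma}(A_{n,\gamma}(2j\varepsilon_{n,\beta_n}))^{\alpha}\, N\big( \tfrac{j\varepsilon_{n,\beta_n}}{3}, A_{n,\gamma}(2j\varepsilon_{n,\beta_n}), d \big)^{1-\alpha}\, e^{\frac{\alpha-1}{18} j^2 n \varepsilon_{n,\beta_n}^2}$$

$$\le \Pi_{n,\gamma}(A_{n,\gamma}(2j\varepsilon_{n,\beta_n}))^{\alpha}\, e^{E_\gamma n \varepsilon_{n,\gamma}^2 + \frac{\alpha-1}{18} j^2 n \varepsilon_{n,\beta_n}^2}.$$

Thus, by the assumption that
$E_\gamma \varepsilon_{n,\gamma}^2 \le G \varepsilon_{n,\beta_n}^2$ for
$\gamma \in I_n^1$, we have

$$E a_n \le \sum_{\gamma \in I_n^1} \sum_{j=[r]}^{\infty} \frac{\lambda_{n,\gamma}^{\alpha}\, \Pi_{n,\gamma}(A_{n,\gamma}(2j\varepsilon_{n,\beta_n}))^{\alpha}\, e^{(3\alpha + 2\alpha C + G + \frac{\alpha-1}{18} j^2) n \varepsilon_{n,\beta_n}^2}}{\lambda_{n,\beta_n}^{\alpha}\, \Pi_{n,\beta_n}(W_{n,\beta_n}(\varepsilon_{n,\beta_n}))^{\alpha}},$$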

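which by condition (3) does not exceed

$$\sum_{\gamma \in I_n^1} \sum_{j=[r]}^{\infty} \mu_{n,\gamma}^{\alpha}\, e^{(G + 3\alpha + 2\alpha C + \alpha L j^2 + \frac{\alpha-1}{18} j^2) n \varepsilon_{n,\beta_n}^2} = O\Big( \sum_{j=[r]}^{\infty} e^{(J + G + 3\alpha + 2\alpha C + \alpha L j^2 + \frac{\alpha-1}{18} j^2) n \varepsilon_{n,\beta_n}^2} \Big)$$

$$= O\Big( e^{(J + G + 3\alpha + 2\alpha C) n \varepsilon_{n,\beta_n}^2} \sum_{j=[r]}^{\infty} e^{-\frac{1 - \alpha - 18\alpha L}{18} j^2 n \varepsilon_{n,\beta_n}^2} \Big)$$

$$= O\Big( e^{(J + G + 3\alpha + 2\alpha C + \alpha L [r] + \frac{\alpha-1}{18} [r]) n \varepsilon_{n,\beta_n}^2} \Big) = O\big( e^{-C n \varepsilon_{n,\beta_n}^2} \big) = O\Big( \frac{1}{n^{1+C}} \Big),$$

where the first equality follows from
$\sum_{\gamma \in I_n} \mu_{n,\gamma}^{\alpha} = O\big(e^{J n \varepsilon_{n,\beta_n}^2}\big)$,
the third one from $1 - \alpha > 18\alpha L$, the next to last one from
$r > \frac{18(C + J + G + 3\alpha + 2\alpha C)}{1 - \alpha - 18\alpha L} + 1$
and the last one from
$n\varepsilon_{n,\beta_n}^2 \ge (1 + \tfrac{1}{C}) \log n$. Therefore, we have
that $\sum_{n=1}^{\infty} E a_n < \infty$. On the other hand, observe that
$\varepsilon_{n,\gamma} > \sqrt{H}\, \varepsilon_{n,\beta_n} \ge \varepsilon_{n,\beta_n}$
for $\gamma \in I_n^2$. So, using the same argument as above, one can get that

$$E b_n \le \sum_{\gamma \in I_n^2} \sum_{j=[r]}^{\infty} \mu_{n,\gamma}^{\alpha}\, e^{(3\alpha + 2C\alpha + G) n \varepsilon_{n,\gamma}^2 + (\alpha L j^2 + \frac{\alpha-1}{18} j^2) n \varepsilon_{n,\gamma}^2}$$

$$= O\Big( \sum_{j=[r]}^{\infty} e^{(J + G + 3\alpha + 2C\alpha + \alpha L j^2 + \frac{\alpha-1}{18} j^2) n \varepsilon_{n,\beta_n}^2} \Big) = O\Big( \frac{1}{n^{1+C}} \Big),$$

which yields that $\sum_{n=1}^{\infty} E b_n < \infty$. Thus, we have proved that

$$\sum_{n=1}^{\infty} P_{f_0}^{\infty}\{ \Pi_n(f : d(f, f_0) \ge r\varepsilon_{n,\beta_n} \mid X_1, X_2, \dots, X_n) \ge \delta \} < \infty,$$

and then by the first Borel-Cantelli Lemma we get that

$$\Pi_n(f : d(f, f_0) \ge r\varepsilon_{n,\beta_n} \mid X_1, X_2, \dots, X_n) < \delta$$

almost surely for all large $n$. The proof of Theorem 2.1 is complete. □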
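Proof of Theorem 2.3. The proof of Theorem 2.3 is in fact a slight modification
of the proof of Theorem 2.1. We only need to repeat the proof of Theorem 2.1
except that we shall apply the following inequalities

$$\Big( \frac{\lambda_{n,\gamma} \int_{\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_{n,\gamma}(df)}{\int_{\mathbb{F}} R_n(f)\, \Pi_n(df)} \Big)^{\alpha} \le \Big( \frac{\int_{\mathcal{P}_{n,\gamma} \cap D(r\varepsilon_{n,\beta_n})} R_n(f)\, \Pi_{n,\gamma}(df)}{\int_{\mathcal{P}_{n,\gamma}} R_n(f)\, \Pi_{n,\gamma}(df)} \Big)^{\alpha}$$

and

$$\int_{\mathcal{P}_{n,\gamma}} R_n(f)\, \Pi_{n,\gamma}(df) \ge \Pi_{n,\gamma}(W_{n,\gamma}(K\varepsilon_{n,\beta_n}))\, e^{-(3+2C) K n \varepsilon_{n,\beta_n}^2}.$$

The details of the proof of Theorem 2.3 are therefore omitted. □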

Proof of Theorem 2.4. The first assertion of Theorem 2.4 follows from the proof
of Theorem 2.1. The second assertion follows similarly by applying the partition

U (J {/ G V nn : d(/,/ ) > re^J U |J {/ e V n>1 : d(/,/ ) < -4=£n, 7 } 

So we omit the details of the proof of Theorem 2.4. □ 